Introduction

             Alternative splicing represents an additional and under-appreciated layer of complexity underlying gene expression profiles. More recently, technological advances in library preparation methodologies enabled capturing and amplification of full-length cDNAs from single cells. Thus, paving the way for splicing analysis at single-cell resolution.

             Nevertheless, single-cell splicing analysis comes with its own set of challenges including, but not limited to, low coverage of lowly-expressed genes, low capture rate of cDNA molecules, and amplification bias (Wen et al., 2020). To date, there remains a paucity of peer-reviewed softwares available for single-cell splicing analysis. Notable examples are BRIE (Huang & Sanguinetti, 2017) and Expedition (Song et al., 2017).

             Here, we introduce MARVEL (Modality Assessment to ReVEaL alternative splicing dynamics at single-cell resolution). MARVEL includes features that complement existing softwares in order to more comprehensively describe and reveal splicing dynamics at single-cell resolution. This tutorial focues on using MARVEL to reveal biological insights using single-cell alternative splicing analysis.

             The main functionalities of MARVEL are:
(1) Compute PSI values for all five main exon-level splicing events, i.e. skipped-exon (SE), mutually-exclusive exons (MXE), retained-intron (RI), alternative 5’ splice site (A5SS), and alternative 3’ splice site (A3SS).
(2) Stratify PSI distribution for each splicing event into the five main modalities, i.e. included, excluded, bimodal, middle, and multimodal. Further stratify included and excluded into primary and dispersed sub-modalities.
(3) Perform differential splicing analysis and identify network of genes which are coordinately spliced.
(4) Integrate both splicing and gene expression data to compare and contrast splicing and gene expression profiles.

Installation

Install package from Github
library(devtools)
install_github("wenweixiong/MARVEL")
Or install package from CRAN
install.packages("MARVEL")
Load MARVEL package and other packages required from this tutorial
library(MARVEL)
library(data.table)    # Fast read-in of input files using fread() function
library(pdp)           # Arranging output plots using grid.arrange() function

Dataset

             In the examples that follow, we will use published single induced pluripotent stem cells (iPSC) and endoderm cells. Single cells were prepared using Smart-seq2 and sequenced on HiSeq2000 on 125-bp paired-end (PE) mode (Linker et al., 2018).
             The data used for this tutorial can be downloaded from here: http://userweb.molbiol.ox.ac.uk/public/wwen/MARVEL_Tutorial_Data.zip

Create MARVEL object

Splicing nomenclature

             Practising accurate description of splicing nomenclature ensures reproducible analysis. Splicing nomenclature is simply the splicing event’s unique identifier (ID). In this tutorial, the splicing event ID should be indicated under the tran_id column of the relevant metadata. For detailed information of splicing nomenclature, please visit https://wenweixiong.github.io/Splicing_Nomenclature.

Splicing data

             MARVEL uses S3 object-oriented system (OOS) to allow convenient data manipulation by users. Therefore, the first step is to create the S3 object. We will use CreateMarvelObject to create our S3 object. Arguments required are:
(1) SplicePheno Sample metadata. Mandatory columns are sample.id and cell.type.
(2) SpliceJunction Splice junction counts.
(3) SpliceFeature Splicing event metadata. Each element in the list represents a data frame corresponding to a specific splicing event type. Names of each element in the list should reflect the splicing event type, i.e. SE, MXE, RI, A5SS, and A3SS. Mandatory columns are tran_id and gene_id.
             It is noteworthy that MARVEL is agnostic with regards to splice junction and splicing event detection. Hence, users have the freedom to use their preferred softwares to detect splicing events and splice junctions. Here, splice junction counts and splicing events were generated using STAR (Dobin et al., 2013) and Stringtie2-rMATS (Kovaka et al., 2019; Shen et al., 2014), respectively.
# Read sample metadata file
path_to_file <- "Data/SJ_phenoData.txt"
df.pheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")

# Subset good-quality samples
df.pheno <- df.pheno[which(df.pheno$sample.type=="Single Cell" & df.pheno$qc.seq=="pass" & df.pheno$cell.type != "Unknown"), ]

df.pheno[1:5,]
##    sample.id cell.type sample.type qc.seq
## 2 ERR1562084      iPSC Single Cell   pass
## 3 ERR1562085      iPSC Single Cell   pass
## 4 ERR1562086      iPSC Single Cell   pass
## 5 ERR1562087      iPSC Single Cell   pass
## 6 ERR1562088      iPSC Single Cell   pass
# Read splice junction file
path_to_file <- "Data/SJ.txt"
sj <- as.data.frame(fread(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA"))

sj[7000:7005,1:5]
##                  coord.intron ERR1562083 ERR1562084 ERR1562085 ERR1562086
## 7000 chr1:114138054:114139614          1          5          1          5
## 7001 chr1:114139881:114142686         NA         NA         NA         NA
## 7002 chr1:114139881:114149666         NA         NA         NA         NA
## 7003 chr1:114139881:114152211         NA         NA         NA         NA
## 7004 chr1:114139881:114152217         NA         NA         NA         NA
## 7005 chr1:114139881:114152765          1          2          1         NA
# Read splicing event metadata files
event.types <- c("SE", "MXE", "RI", "A5SS", "A3SS")

df.feature.list <- list()

for(i in 1:length(event.types)) {

    path_to_file <- paste("Data/", event.types[i], "_featureData.txt", sep="")
    df.feature.list[[i]] <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")

}

names(df.feature.list) <- event.types

df.feature.list[[1]][1:5, ]
##                                                                          tran_id
## 1 chrX:156022314:156022459:+@chrX:156022699:156022834:+@chrX:156023012:156023209
## 2 chrX:154227192:154227360:+@chrX:154228828:154228993:+@chrX:154230548:154230598
## 3 chrX:154190054:154190222:+@chrX:154191688:154191853:+@chrX:154193408:154193458
## 4 chrX:154152940:154153108:+@chrX:154154574:154154739:+@chrX:154156294:154156344
## 5 chrX:152918983:152919081:+@chrX:152920328:152920411:+@chrX:152920707:152920748
##              gene_id gene_short_name                          gene_type
## 1 ENSG00000182484.15          WASH6P transcribed_unprocessed_pseudogene
## 2  ENSG00000166160.9         OPN1MW2                     protein_coding
## 3  ENSG00000268221.5          OPN1MW                     protein_coding
## 4  ENSG00000102076.9          OPN1LW                     protein_coding
## 5 ENSG00000147394.18          ZNF185                     protein_coding
# Create MARVEL object
marvel <- CreateMarvelObject(
            SplicePheno=df.pheno,          # Sample metadata
            SpliceJunction=sj,             # Splice junction counts 
            SpliceFeature=df.feature.list  # Splicing event metadata
            )

Gene expression data

             MARVEL can also include gene expression data into its object. In fact, this is highly recommended as we will be comparing and contrasting splicing and gene expression patterns later to illustrate additional complexities revealed by splicing in addition to gene expression data. Gene expression values should be normalised (TPM/FPKM/RPKM) and log-transformed prior to creating the MARVEL object. Here, we used RSEM to derive TPM values (Li & Dewey, 2011). Similar to splicing data above, three additional arguments are required:
(1) GenePheno Sample metadata. Mandatory columns are sample.id and cell.type.
(2) GeneFeature Gene metadata. Mandatory column is gene_id.
(3) Exp Normalised and log-transformed gene expression matrix.
# Sample metadata
path_to_file <- "Data/TPM_phenoData.txt"
df.tpm.pheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)
df.tpm.pheno[1:5, ]
##    sample.id cell.type sample.type qc.seq
## 1 ERR1562083   Unknown Single Cell   pass
## 2 ERR1562084      iPSC Single Cell   pass
## 3 ERR1562085      iPSC Single Cell   pass
## 4 ERR1562086      iPSC Single Cell   pass
## 5 ERR1562087      iPSC Single Cell   pass
# Gene metadata
path_to_file <- "Data/TPM_featureData.txt"
df.tpm.feature <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)
df.tpm.feature[1:5, ]
##              gene_id gene_short_name      gene_type
## 1 ENSG00000000003.14          TSPAN6 protein_coding
## 2  ENSG00000000005.6            TNMD protein_coding
## 3 ENSG00000000419.12            DPM1 protein_coding
## 4 ENSG00000000457.14           SCYL3 protein_coding
## 5 ENSG00000000460.17        C1orf112 protein_coding
# Matrix
path_to_file <- "Data/TPM.txt"
df.tpm <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE)
df.tpm[1:5,1:5]
##              gene_id ERR1562083 ERR1562084 ERR1562085 ERR1562086
## 1 ENSG00000000003.14     343.26     163.45     210.43     190.46
## 2  ENSG00000000005.6       5.69       0.00       0.00       0.00
## 3 ENSG00000000419.12     288.02     155.26      42.49     238.67
## 4 ENSG00000000457.14       1.58       8.71       0.00       1.06
## 5 ENSG00000000460.17      41.23      28.53     100.17      66.43
# Subset good-quality samples
df.tpm.pheno <- df.tpm.pheno[which(df.tpm.pheno$qc.seq=="pass" & df.tpm.pheno$cell.type != "Unknown"), ]

# Transform values
df.tpm[,-1] <- log2(df.tpm[,-1] + 1)

# Set threshold for values
df.tpm[,-1][df.tpm[,-1] < 1] <- 0

# Create Marvel object
marvel <- CreateMarvelObject(
            SplicePheno=df.pheno,          # Sample metadata
            SpliceJunction=sj,             # Splice junction counts 
            SpliceFeature=df.feature.list, # Splicing event metadata
            GenePheno=df.tpm.pheno,        # Sample metadata
            GeneFeature=df.tpm.feature,    # Gene metadata
            Exp=df.tpm                     # Gene expression matrix
            )

Compute PSI

             Percent spliced-in (PSI) is the fraction of reads supporting the included isoform over the number of reads supporting both included and excluded isoforms. Included isoforms are isoforms that include the alternative exon. Hence, PSI is a measure of alternative exon inclusion (“spliced-in”).

             To date, peer-reviewed softwares focused only on SE events. MARVEL expands exon-level splicing analysis to include all five main exon-level splicing events, namely SE, MXE, RI, A5SS, and A3SS. Other than SE, the other 4 splicing types have been reported to play important roles in both health and disease too. For example, RI is known to regulate gene expression through nonsense-mediated decay (Smart et al., 2018).

             First, MARVEL will cross-check the splicing events with the splice junctions provided. Only splicing events whose splice junctions are found in the splice junction file are retained. This ensures only high-quality splicing events are included for PSI calculation and downstream analysis.

             Furthermore, MARVEL allows users to set a coverage threshold using the CoverageThreshold argument, above which the splicing event is included for PSI calculation. For example, if the coverage threshold is set at 10, then MARVEL will only include the splicing event if all included or excluded junctions involved in calculating the splicing events are supported by at least 10 or more reads. NA in the PSI matrix returned are splicing events whose sample that did not have sufficient reads, i.e. lower number of reads than that specified by the user. The coverage threshold of 10 has been used multiple times in previous single-cell studies for selecting splicing events for downstream analysis (Song et al., 2017; Buen Abad Najar et al, 2020).

             If you would like to use the pre-computed PSI values generated by MARVEL for this tutorial, please head to Pre-computed PSI section.

Skipped-exon (SE)

marvel <- ComputePSI(MarvelObject=marvel, CoverageThreshold=10, EventType="SE")
## [1] "Analysing 51154 splicing events"
## [1] "20509 splicing events validated and quantified"
marvel$SpliceFeatureValidated$SE[1:5, ]
##                                                                           tran_id
## 6  chrX:152922173:152922256:+@chrX:152922720:152922809:+@chrX:152928575:152928661
## 7  chrX:152922173:152922256:+@chrX:152922720:152922809:+@chrX:152931672:152931776
## 8  chrX:152922173:152922256:+@chrX:152928575:152928661:+@chrX:152931672:152931776
## 9  chrX:152922720:152922809:+@chrX:152928575:152928661:+@chrX:152931672:152931776
## 12 chrX:152932873:152932971:+@chrX:152936418:152936513:+@chrX:152938074:152938163
##    event_type            gene_id gene_short_name      gene_type
## 6          SE ENSG00000147394.18          ZNF185 protein_coding
## 7          SE ENSG00000147394.18          ZNF185 protein_coding
## 8          SE ENSG00000147394.18          ZNF185 protein_coding
## 9          SE ENSG00000147394.18          ZNF185 protein_coding
## 12         SE ENSG00000147394.18          ZNF185 protein_coding
marvel$PSI$SE[5000:5005, 1:5]
##                                                                                tran_id
## 5000          chr1:42693490:42693523:+@chr1:42696199:42696288:+@chr1:42696642:42696944
## 5001    chr1:160350105:160350250:+@chr1:160351222:160351372:+@chr1:160351696:160351729
## 5002    chr1:160350105:160350250:+@chr1:160351288:160351372:+@chr1:160351696:160351805
## 5003    chr1:160353160:160353237:+@chr1:160353323:160353501:+@chr1:160354118:160354290
## 5004 chr14:100376439:100376495:+@chr14:100380910:100381746:+@chr14:100468021:100468168
## 5005 chr14:100380910:100381746:+@chr14:100468021:100468168:+@chr14:100483994:100484124
##      ERR1562083 ERR1562084 ERR1562085 ERR1562086
## 5000          1          1         NA          1
## 5001         NA         NA         NA          1
## 5002         NA         NA         NA         NA
## 5003         NA         NA          0         NA
## 5004         NA         NA         NA         NA
## 5005         NA         NA         NA         NA
Validated splicing events and computed PSI values are returned in marvel$SpliceFeatureValidated$SE and marvel$PSI$SE slot, respectively.
marvel$SpliceFeatureValidated$SE[1:5, ]
##                                                                           tran_id
## 6  chrX:152922173:152922256:+@chrX:152922720:152922809:+@chrX:152928575:152928661
## 7  chrX:152922173:152922256:+@chrX:152922720:152922809:+@chrX:152931672:152931776
## 8  chrX:152922173:152922256:+@chrX:152928575:152928661:+@chrX:152931672:152931776
## 9  chrX:152922720:152922809:+@chrX:152928575:152928661:+@chrX:152931672:152931776
## 12 chrX:152932873:152932971:+@chrX:152936418:152936513:+@chrX:152938074:152938163
##    event_type            gene_id gene_short_name      gene_type
## 6          SE ENSG00000147394.18          ZNF185 protein_coding
## 7          SE ENSG00000147394.18          ZNF185 protein_coding
## 8          SE ENSG00000147394.18          ZNF185 protein_coding
## 9          SE ENSG00000147394.18          ZNF185 protein_coding
## 12         SE ENSG00000147394.18          ZNF185 protein_coding
marvel$PSI$SE[5000:5005, 1:5]
##                                                                                tran_id
## 5000          chr1:42693490:42693523:+@chr1:42696199:42696288:+@chr1:42696642:42696944
## 5001    chr1:160350105:160350250:+@chr1:160351222:160351372:+@chr1:160351696:160351729
## 5002    chr1:160350105:160350250:+@chr1:160351288:160351372:+@chr1:160351696:160351805
## 5003    chr1:160353160:160353237:+@chr1:160353323:160353501:+@chr1:160354118:160354290
## 5004 chr14:100376439:100376495:+@chr14:100380910:100381746:+@chr14:100468021:100468168
## 5005 chr14:100380910:100381746:+@chr14:100468021:100468168:+@chr14:100483994:100484124
##      ERR1562083 ERR1562084 ERR1562085 ERR1562086
## 5000          1          1         NA          1
## 5001         NA         NA         NA          1
## 5002         NA         NA         NA         NA
## 5003         NA         NA          0         NA
## 5004         NA         NA         NA         NA
## 5005         NA         NA         NA         NA

Mutually excl. exons (MXE)

marvel <- ComputePSI(MarvelObject=marvel, CoverageThreshold=10, EventType="MXE")
## [1] "Analysing 4892 splicing events"
## [1] "1279 splicing events validated and quantified"
Validated splicing events and computed PSI values are returned in marvel$SpliceFeatureValidated$MXE and marvel$PSI$MXE slot, respectively.

Alt. 5’ splice site (A5SS)

marvel <- ComputePSI(MarvelObject=marvel, CoverageThreshold=10, EventType="A5SS")
## [1] "Analysing 10464 splicing events"
## [1] "5163 splicing events validated and quantified"
Validated splicing events and computed PSI values are returned in marvel$SpliceFeatureValidated$A5SS and marvel$PSI$A5SS slot, respectively.

Alt. 3’ splice site (A3SS)

marvel <- ComputePSI(MarvelObject=marvel, CoverageThreshold=10, EventType="A3SS")
## [1] "Analysing 13160 splicing events"
## [1] "5852 splicing events validated and quantified"
Validated splicing events and computed PSI values are returned in marvel$SpliceFeatureValidated$A3SS and marvel$PSI$A3SS slot, respectively.

Retained-intron (RI)

MARVEL will first cross-check the intron coordinates with all the exons from SE, MXE, A5SS, and A3SS for any overlaps. Only introns that do not overlap with any exons are retained for PSI calculation and downstream analysis. Introns that do not overlap with exons are called independent introns. This strategy has been reported previously to ensure more accurate PSI estimation for introns (Broseus & Ritchie, 2020). To this end, we need to provide MARVEL with the total coverage of each intron to the IntronCountsFile argument. Here, we used Bedtools to compute the total coverage for each intron (Quinlan & Hall, 2010).
# Read per-base intron coverage file
path_to_file <- "Data/Counts_by_Region.txt"
df.intron.counts <- as.data.frame(fread(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA"))
df.intron.counts[1:5, 1:5]
##         coord.intron ERR1562083 ERR1562084 ERR1562085 ERR1562086
## 1   chr1:13375:13452          0          0          0          0
## 2 chr1:146510:146641          0        150          0        129
## 3 chr1:498457:498683        678          0          0          0
## 4 chr1:764801:765143       5375       4544       6794       1591
## 5 chr1:804967:806385        794          0          0          0
# Compute PSI
marvel <- ComputePSI(MarvelObject=marvel, CoverageThreshold=10, IntronCountsFile=df.intron.counts, thread=4, EventType="RI")
## [1] "Analysing 11829 splicing events"
## [1] "8295 splicing events validated and quantified"

Pre-computed PSI

             Users may also use pre-computed PSI values from external softwares or pre-computed values previously generated from MARVEL. The former will be useful when you only wish to use MARVEL for functions other than computing PSI values such as modality assignment and differential splicing analysis. The latter will be useful if you don’t wish to re-compute the PSI values every time you perform your downstream analysis.
# Sample metadata
path_to_file <- "Data/SJ_phenoData.txt"
df.pheno <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")

# Subset good-quality samples
df.pheno <- df.pheno[which(df.pheno$sample.type=="Single Cell" & df.pheno$qc.seq=="pass" & df.pheno$cell.type != "Unknown"), ]

# List of validated splicing event metadata
event.types <- c("SE", "MXE", "RI", "A5SS", "A3SS")

df.feature.list <- list()

for(i in 1:length(event.types)) {

    path_to_file <- paste("Data/", event.types[i], "_featureData_Validated.txt", sep="")
    df.feature.list[[i]] <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")

}

names(df.feature.list) <- event.types

# List of pre-computed PSI matrices
event.types <- c("SE", "MXE", "RI", "A5SS", "A3SS")

df.list <- list()

for(i in 1:length(event.types)) {

    path_to_file <- paste("Data/", event.types[i], ".txt", sep="")
    df.list[[i]] <- read.table(path_to_file, sep="\t", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")

}

names(df.list) <- event.types

# Create MARVEL object
marvel <- CreateMarvelObject(
            SplicePheno=df.pheno,                    # Sample metadata
            SpliceFeatureValidated=df.feature.list,  # Validated splicing event metadata
            PSI=df.list,                             # Pre-computed PSI matrices
            GenePheno=df.tpm.pheno,                  # Sample metadata
            GeneFeature=df.tpm.feature,              # Gene metadata
            Exp=df.tpm                               # Gene expression matrix
            )

Clustering analysis

             Alternative splicing represents an additional layer of complexity underlying gene expression profile. If this is true, then splicing data should be able to distinguish between the different cell types when gene expression can’t. RunPCA function performs dimension reduction and visualizes these principle components. Indicate "gene" or "splicing" in the level argument for performing PCA on gene and splicing data, respectively. Results including the PCA coordinates and plots are returned in the marvel$PCA$Gene and marvel$PCA$PSI slot, respectively.

Gene expression profile

DE genes

# Differential gene expression analysis
marvel <- CompareValues(
            MarvelObject=marvel,              # Marvel object
            cell.types=c("iPSC", "Endoderm"), # Cell types to include for analysis
            n.cells=3,                        # Min. no. of cells expression value > 1 required 
            method="wilcox",                  # "wilcox"/"t.test" 
            method.adj="fdr",                 # Adjust for multiple testing as per p.adjust
            level="gene"                      # "gene"/"splicing" data for DE analysis
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$DE$Gene[1:5, ]
##              gene_id gene_short_name      gene_type n.cells.g1 n.cells.g2
## 1  ENSG00000147869.4            CER1 protein_coding          8         53
## 2 ENSG00000183569.18          SERHL2 protein_coding          8         53
## 3 ENSG00000116729.14             WLS protein_coding          9         52
## 4 ENSG00000095596.12         CYP26A1 protein_coding          8         52
## 5  ENSG00000125845.7            BMP2 protein_coding          4         49
##      mean.g1   mean.g2   mean.fc        p.val    p.val.adj
## 1 0.43386686 13.071504 12.637637 9.976934e-27 1.402458e-22
## 2 0.26760689  8.581689  8.314082 2.084395e-26 1.465017e-22
## 3 0.37060976  6.791721  6.421112 1.848477e-25 8.661346e-22
## 4 0.32284501  7.926165  7.603320 2.621966e-25 9.214242e-22
## 5 0.06866666  4.340660  4.271993 8.722613e-25 2.452275e-21
# Reduce dimension using significant gene
  # Subset gene IDs for PCA
  features <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj < 0.1), "gene_id"]
  head(features)
## [1] "ENSG00000147869.4"  "ENSG00000183569.18" "ENSG00000116729.14"
## [4] "ENSG00000095596.12" "ENSG00000125845.7"  "ENSG00000138650.9"
  # PCA
  marvel <- RunPCA(
              MarvelObject=marvel, # Marvel object
              cell.types="all",    # Cell types to include for analysis
              n.cells=3,           # Min. no. of cells expression value > 1 required 
              features=features,   # gene_ids included for PCA
              point.size=2.5,      # Point size on PCA plot
              level="gene"         # "gene"/"splicing" data for PCA analysis
              )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
  # View results
  marvel$PCA$Gene$Results$ind$coord[1:5, 1:5]
##               Dim.1      Dim.2     Dim.3       Dim.4      Dim.5
## ERR1562084 26.35373 -13.406549 -8.341069 -1.07442981 -4.6848342
## ERR1562085 29.78218   2.192240 -1.999656  4.74826248 -2.9679404
## ERR1562086 37.32428   1.297545  4.258658  2.48892029  0.5972728
## ERR1562087 26.68471 -12.754620 -2.475831  0.08248967  3.8975152
## ERR1562088 23.86626  -8.783569 -8.174792 -1.38758293 -3.0523503
  marvel$PCA$Gene$Plot

Non-DE genes

# Reduce dimension using non-significant gene
  # Subset gene IDs for PCA
  features <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
  head(features)
## [1] "ENSG00000087095.13" "ENSG00000107201.10" "ENSG00000260916.7" 
## [4] "ENSG00000232533.1"  "ENSG00000183023.18" "ENSG00000132471.12"
  # PCA
  marvel <- RunPCA(
              MarvelObject=marvel,  # Marvel object
              cell.types="all",     # Cell types to include for analysis
              n.cells=3,            # Min. no. of cells expression value > 1 required 
              features=features,    # gene_ids included for PCA
              point.size=2.5,       # Point size on PCA plot
              level="gene"          # "gene"/"splicing" data for PCA analysis
              )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
  # View results
  marvel$PCA$Gene$Results$ind$coord[1:5, 1:5]
##                 Dim.1      Dim.2      Dim.3      Dim.4      Dim.5
## ERR1562084 -0.5247406 -11.907822  3.0675817 -2.8341533 -2.3077158
## ERR1562085  1.2189925   1.010999 -0.8835093 -0.9467759 -3.0654454
## ERR1562086  7.7483364   2.889528  0.5596277 -0.8366935 -2.9239051
## ERR1562087  7.6036138  -8.791249  2.8571604 -2.2145458  0.6518971
## ERR1562088 -3.0081972  -8.870141  3.7967401  0.7924715 -3.5274754
  marvel$PCA$Gene$Plot

             As expected, using differentially expressed (DE) genes led to clear separation of iPSC and endoderm cells. On the other hand, using non-differentially expressed (non-DE) genes did not lead to clear separation of iPSC and endoderm cells. Can splicing events detected from among non-DE genes lead to separation of these two cell types?

PSI profile

Skipped-exon (SE)

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="SE",      # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                 Dim.1      Dim.2      Dim.3     Dim.4      Dim.5
## ERR1562084   5.862029  -4.225828  2.4701030  1.335188 -0.2593135
## ERR1562085  -1.790890  -5.923663  0.8156352  2.354919 -0.5013446
## ERR1562086   2.460916  -9.935629  1.7018188  3.771344 -0.7139902
## ERR1562087 -18.488769 -13.581346 -9.6644259  2.855215 -3.9251291
## ERR1562088  -8.939892 -10.288426  1.6504017 -1.247601 -4.0971045
marvel$PCA$PSI$Plot

Mutually excl. exons (MXE)

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="MXE",     # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                Dim.1      Dim.2      Dim.3      Dim.4      Dim.5
## ERR1562084 -1.411695 -0.7888214  1.5289269 -1.8463752 -1.0326407
## ERR1562085  1.508032 -0.9409458  0.5516637  1.8432467 -1.4280783
## ERR1562086 -1.323370  1.0132772 -1.5103831 -0.3778736 -1.2492145
## ERR1562087  2.574057  1.5121222  0.5855348  1.2797351 -0.1115208
## ERR1562088  1.591145  0.8998149 -1.1776118  0.8183776  2.6583508
marvel$PCA$PSI$Plot

Alt. 5’ splice site (A5SS)

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="A5SS",    # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                  Dim.1    Dim.2      Dim.3      Dim.4      Dim.5
## ERR1562084 -2.26807175 3.300224 -0.4268204  0.1910463  0.2697612
## ERR1562085  0.08285055 1.858291 -0.8740948  1.4349998 -0.8866429
## ERR1562086 -1.81062253 1.568028 -1.9505886 -1.1240744  1.1179956
## ERR1562087 10.95421615 5.297058 -0.3705043  5.7466718  0.9044293
## ERR1562088  4.65106902 8.557114  6.0320342  0.1978582 -5.8458444
marvel$PCA$PSI$Plot

Alt. 3’ splice site (A3SS)

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="A3SS",    # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                 Dim.1     Dim.2      Dim.3     Dim.4      Dim.5
## ERR1562084   2.730793 -1.874929 -4.7391574 3.0032619  0.4880384
## ERR1562085  -1.101971 -7.007653 -3.6502803 1.9878255  1.8424534
## ERR1562086   2.359234 -3.325786 -0.9149622 2.2663237 -2.0935623
## ERR1562087 -11.221938 -2.489183 -2.3777300 2.4092784 -0.1892208
## ERR1562088  -5.373639 -6.682712 -2.1940072 0.1977603 -2.0217181
marvel$PCA$PSI$Plot

Retained-intron (RI)

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="RI",      # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                 Dim.1    Dim.2      Dim.3      Dim.4     Dim.5
## ERR1562084 -2.4980040 3.231516 -1.5459522  1.9900605  1.197671
## ERR1562085 -1.3026228 5.107684  0.4156113  3.3185404  1.308196
## ERR1562086 -0.7584466 4.405877 -0.1917594  1.3998800  2.534308
## ERR1562087 11.5985216 6.689349 -0.3925192 -1.3654295 -3.546887
## ERR1562088  5.9427771 5.393905  1.6550813  0.2834983 -2.266737
marvel$PCA$PSI$Plot

All

# Subset isoform IDs of non-DE genes for PCA
gene_ids <- marvel$DE$Gene[which(marvel$DE$Gene$p.val.adj > 0.1), "gene_id"]
features <- sapply(marvel$SpliceFeatureValidated, function(x) {
                x[which(x$gene_id %in% gene_ids), "tran_id"]
            })
features <- unlist(features)

# Reduce dimension
marvel <- RunPCA(
            MarvelObject=marvel,  # Marvel object
            cell.types="all",     # Cell types to include for analysis
            n.cells=25,           # Min. no. of cells PSI != NA required 
            features=features,    # tran(script)_ids included for PCA
            point.size=2.5,       # Point size on PCA plot
            level="splicing",     # "gene"/"splicing" data for PCA analysis
            event.type="all",     # Event type to include for PCA
            seed=1                # Ensures imputed values for NA PSIs are reproducible
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$PCA$PSI$Results$ind$coord[1:5, 1:5]
##                Dim.1     Dim.2     Dim.3      Dim.4     Dim.5
## ERR1562084 -7.252956  6.880395 -3.029881 -1.1987400  3.076776
## ERR1562085  1.075935  9.116930 -1.487205  3.7795817  2.523780
## ERR1562086 -2.493326 11.387493  1.842431  0.4369814 -3.979436
## ERR1562087 26.547452 14.420885 -3.983069 -1.6757774 -1.087556
## ERR1562088 13.342924 15.275716 -2.986429 -0.4954586 -2.254429
marvel$PCA$PSI$Plot

Conclusions

             PSI profiles of non-DE genes for SE, A5SS, A3SS, and RI were able to distinguish iPSC and endoderm cells. MXE was not able to do so probably due to the small number of expressed events here, i.e. only < 30 events. Overall, splicing profile can distinguish cellular identity when gene expression profile can’t.

Modality analysis

             For each splicing event, its PSI distribution across a group of single cells can be assigned to 5 categories. The original 5 categories were conceived by Song et al (Mol Cell, 2017). They are included, excluded, bimodal, middle, and multimodal. Here, we refer to these 5 modalities are the basic modalities.
             MARVEL provides 2 novel approaches to the previously proposed modality classification:

(1) We further stratify the included and excluded modalities into primary or dispersed based the variance. Included and excluded modalities with small variance, i.e. PSI values very tightly clustered around 1 and 0, respectively, are further classified as primary sub-modality. On the other hand, included and excluded modalities with large variance, i.e. PSI values that cluster around 1 and 0, respectively, and with some values spread out across 0-1, are further classified as dispersed sub-modality. Here, we refer to the 5 main modalities + 2 sub-modalities as the extended modalities.

(2) We provide a means of distinguishing true from false bimodal distributions. False bimodals can be attributed, but not limited to, amplification bias, sequencing artifacts, or rare cell types within a presumed homogenous cell population. False bimodals are detected and re-assigned as included or excluded modality.

             The AssignModality function assigns modality for each splicing event. Briefly, the maximum likelihood estimation is used to derive the alpha and beta parameters of the beta distribution to stratify each splicing event into one of the modalities.

             The PropModality function tabulates and plots the proportion of each modality type for a given splicing event type or across splicing event types. Results are returned in marvel$Modality$Prop slot.

iPSC

Let us look into the modality assignment and pattern in iPSC.
# Assign modalities
marvel <- AssignModality(
            MarvelObject=marvel, # Marvel object
            cell.type="iPSC",    # Cell type to include for analysis
            n.cells=25,          # Min. no. of cells PSI != NA required 
            sigma.sq=0.001,      # Variance value below which sub-modality is primary,
                                                # above which sub-modality is dispersed
            bimodal.adjust=TRUE, # Detect and rectify false bimodals
            seed=1               # Ensures MLE model returns reproducible parameter values
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$Modality$Results[1:5, ]
##                                                                          tran_id
## 1 chrX:155090784:155090839:+@chrX:155099315:155099389:+@chrX:155116057:155116188
## 2 chrX:155090784:155090839:+@chrX:155116057:155116188:+@chrX:155116711:155116754
## 3 chrX:155033403:155033553:+@chrX:155046509:155046584:+@chrX:155051670:155051801
## 4 chrX:155046509:155046584:+@chrX:155047369:155047391:+@chrX:155051670:155051801
## 5 chrX:154399338:154399396:+@chrX:154399487:154399594:+@chrX:154399803:154399941
##   event_type            gene_id gene_short_name      gene_type n.cells modality
## 1         SE ENSG00000185515.14           BRCC3 protein_coding      43  Bimodal
## 2         SE ENSG00000185515.14           BRCC3 protein_coding      35 Included
## 3         SE ENSG00000165775.18          FUNDC2 protein_coding      63 Included
## 4         SE ENSG00000165775.18          FUNDC2 protein_coding      67 Excluded
## 5         SE ENSG00000147403.16           RPL10 protein_coding      83 Included
##         modality.var modality.bimodal.adj
## 1            Bimodal   Excluded.Dispersed
## 2   Included.Primary     Included.Primary
## 3 Included.Dispersed   Included.Dispersed
## 4   Excluded.Primary     Excluded.Primary
## 5   Included.Primary     Included.Primary
Brief explanation of the output table as follows:
(1) n.cells column indicates the no. of cells expressing the splicing event.
(2) modality column represents the 5 basic modalities.
(3) modality.var column represents the extended modalities as proposed by us.
(4) modality.bimodal.adj column represents the extended modalities whose false bimodals have been identified and reclassified into either included or excluded modalities.
Let’s investigate the proportion of each modality type.
# Plot basic modalities
marvel <- PropModality(
           MarvelObject=marvel,                    # Marvel object from AssignModality function
           modality.column="modality.bimodal.adj", # Modality column to tabulate proportion
           modality.type="basic",                  # Five original modalities by Song (Mol Cell,2017)
           event.type="all",                       # Event type(s) to include for analysis
           across.event.type=FALSE                 # Do not compare proportion across event types
           )
                       
plot.1 <- marvel$Modality$Prop$DoughnutChart$Plot

# Plot extended modalities
marvel <- PropModality(
           MarvelObject=marvel,                    
           modality.column="modality.bimodal.adj", 
           modality.type="extended",                  # Extended modalities by us
           event.type="all",                       
           across.event.type=FALSE                 
           )
                       
plot.2 <- marvel$Modality$Prop$DoughnutChart$Plot

grid.arrange(plot.1, plot.2, nrow=1)

Let’s investigate the proportion of each modality type across the different splicing event types
marvel <- PropModality(
           MarvelObject=marvel,                    # Marvel object from AssignModality function
           modality.column="modality.bimodal.adj", # Modality column to tabulate proportion
           modality.type="basic",                  # Five original modalities by Song (Mol Cell,2017)
           event.type="all",                       # Event type(s) to include for analysis
           across.event.type=TRUE,                 # Compare proportion across event types
           prop.test="chisq",                      # Statistical test: "chisq"/"fisher"
           prop.adj="fdr"                          # Adjust for multiple testing as per p.adjust
           )

marvel$Modality$Prop$BarChart$Plot

marvel$Modality$Prop$BarChart$Test
##     modality         p.val     p.val.adj
## 1   Included 3.233246e-165 1.616623e-164
## 2   Excluded 3.680623e-131 9.201558e-131
## 3    Bimodal  6.142436e-04  6.142436e-04
## 4     Middle  3.110818e-19  5.184696e-19
## 5 Multimodal  2.664265e-08  3.330331e-08

Endoderm

Let us look into the modality assignment and pattern in endoderm cells.
# Assign modalities
marvel <- AssignModality(
            MarvelObject=marvel, # Marvel object
            cell.type="Endoderm",# Cell type to include for analysis
            n.cells=25,          # Min. no. of cells PSI != NA required 
            sigma.sq=0.001,      # Variance value below which sub-modality is primary,
                                                # above which sub-modality is dispersed
            bimodal.adjust=TRUE, # Detect and rectify false bimodals
            seed=1               # Ensures MLE model returns reproducible parameter values
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$Modality$Results[1:5, ]
##                                                                          tran_id
## 1 chrX:155033403:155033553:+@chrX:155046509:155046584:+@chrX:155051670:155051801
## 2 chrX:155046509:155046584:+@chrX:155047369:155047391:+@chrX:155051670:155051801
## 3 chrX:154399338:154399396:+@chrX:154399487:154399594:+@chrX:154399803:154399941
## 4 chrX:154399803:154399941:+@chrX:154400464:154400626:+@chrX:154400702:154402339
## 5 chrX:154379197:154379566:+@chrX:154379690:154379794:+@chrX:154379942:154380019
##   event_type            gene_id gene_short_name      gene_type n.cells modality
## 1         SE ENSG00000165775.18          FUNDC2 protein_coding      28 Included
## 2         SE ENSG00000165775.18          FUNDC2 protein_coding      31  Bimodal
## 3         SE ENSG00000147403.16           RPL10 protein_coding      53 Included
## 4         SE ENSG00000147403.16           RPL10 protein_coding      53 Included
## 5         SE ENSG00000102119.10             EMD protein_coding      30 Included
##         modality.var modality.bimodal.adj
## 1 Included.Dispersed   Included.Dispersed
## 2            Bimodal   Excluded.Dispersed
## 3   Included.Primary     Included.Primary
## 4   Included.Primary     Included.Primary
## 5   Included.Primary     Included.Primary
Brief explanation of the output table as follows:
(1) n.cells column indicates the no. of cells expressing the splicing event.
(2) modality column represents the 5 basic modalities.
(3) modality.var column represents the extended modalities as proposed by us.
(4) modality.bimodal.adj column represents the extended modalities whose false bimodals have been identified and reclassified into either included or excluded modalities.
Let’s investigate the proportion of each modality type.
# Plot basic modalities
marvel <- PropModality(
           MarvelObject=marvel,                    # Marvel object from AssignModality function
           modality.column="modality.bimodal.adj", # Modality column to tabulate proportion
           modality.type="basic",                  # Five original modalities by Song (Mol Cell,2017)
           event.type="all",                       # Event type(s) to include for analysis
           across.event.type=FALSE                 # Do not compare proportion across event types
           )
                       
plot.1 <- marvel$Modality$Prop$DoughnutChart$Plot

# Plot extended modalities
marvel <- PropModality(
           MarvelObject=marvel,                    
           modality.column="modality.bimodal.adj", 
           modality.type="extended",               # Extended modalities by us
           event.type="all",                       
           across.event.type=FALSE                 
           )
                       
plot.2 <- marvel$Modality$Prop$DoughnutChart$Plot

grid.arrange(plot.1, plot.2, nrow=1)

Let’s investigate the proportion of each modality type across the different splicing event types
marvel <- PropModality(
           MarvelObject=marvel,                    # Marvel object from AssignModality function
           modality.column="modality.bimodal.adj", # Modality column to tabulate proportion
           modality.type="basic",                  # Five original modalities by Song (Mol Cell,2017)
           event.type="all",                       # Event type(s) to include for analysis
           across.event.type=TRUE,                 # Compare proportion across event types
           prop.test="chisq",                      # Statistical test: "chisq"/"fisher"
           prop.adj="fdr"                          # Adjust for multiple testing as per p.adjust
           )

marvel$Modality$Prop$BarChart$Plot

marvel$Modality$Prop$BarChart$Test
##     modality        p.val    p.val.adj
## 1   Included 5.954299e-75 2.977150e-74
## 2   Excluded 7.014251e-57 1.753563e-56
## 3    Bimodal 6.995854e-04 6.995854e-04
## 4     Middle 3.167475e-07 5.279125e-07
## 5 Multimodal 1.042874e-05 1.303592e-05

Conclusions

(1) When we look at each modality’s overall proportion (doughtnut plot), we see that the included (red) and excluded (blue) modalities are the most frequent modalities. This is consistent with previous studies that reported most genes would have only one isoform detected at single-cell level. Here, included and excluded modalities would reflect that most genes either include or exclude the alternative exon, but rarely express both isoforms in each single cell (i.e. middle and multimodal modalities).

(2) Moreover, both primary and dispersed sub-modalities contribute almost equally to both included and excluded modalities. This suggest that our sub-modality classification identifies major sub-types of included and excluded modalities.

(3) When we look at the each modality’s overall proportion across the different splicing event types, we can see certain modalities are more frequent in a specific event type compared to others. The most prominent observation is that the excluded modality is most frequent in RI. This is consistent with the role of RI in subjecting transcripts to nonsense-mediated decay (NMD) as a means of regulating gene expression (Smart et al, 2018).

Differential splicing analysis

Differential splicing analysis

       The aim of single-cell differential gene analysis is to determine if there is a statistically significant difference between the means of different group of cells. On the other hand, variance is a measure of uncertainty. The larger the variance, the lower the statistical power to detect a significant difference in means between groups of cells.

       In contrast, comparing the means of the PSI distributions between groups of cells alone is insufficient. For example, bimodal, middle, and multimodal all have PSI \(\approx\) 0.5 but have clearly different PSI distributions and therefore are categorized as different modalities. Moreover, the middle modality has different variance compared to both bimodal and multimodal modalities. Hence, in splicing analysis, variance is informative.

       Therefore, the aim of single-cell differential splicing analysis is to determine if the distribution of PSI values of one group of cells is significantly different from another group of cells. A suitable statistical method for comparing the PSI distribution between groups of single cell is the Kolmogorov-Smirnov test.

       The differences in PSI distribution for each splicing event between groups of single cells can be assessed using CompareValues function together with level=splicing argumennt. Results are returned in the marvel$DE$PSI slot
marvel <- CompareValues(
            MarvelObject=marvel,               # Marvel object
            cell.types=c("iPSC", "Endoderm"),  # Cell types to analyse
            n.cells=25,                        # Min. no. of cells PSI != NA required 
            method="ks",                       # "ks"/"wilcox"/"t.test"
            method.adj="fdr",                  # Adjust for multiple testing as per p.adjust
            level="splicing"                   # "gene"/"splicing" data to analyse
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$DE$PSI[1:5, ]
##                                                                       tran_id
## 1 chr15:24962114:24962209:+@chr15:24967029:24967152:+@chr15:24967932:24968082
## 2 chr15:24962114:24962209:+@chr15:24967029:24967240:+@chr15:24967932:24968082
## 3 chr10:78037194:78037304:+@chr10:78037439:78037441:+@chr10:78040204:78040225
## 4                           chr13:43059394:43059714:+@chr13:43062190:43062295
## 5               chr3:197950190:197950221|197950299:+@chr3:197950936:197950978
##   event_type            gene_id gene_short_name      gene_type n.cells.g1
## 1         SE ENSG00000128739.22           SNRPN protein_coding         83
## 2         SE ENSG00000128739.22           SNRPN protein_coding         83
## 3         SE ENSG00000138326.20           RPS24 protein_coding         83
## 4         RI  ENSG00000120675.6         DNAJC15 protein_coding         67
## 5       A5SS ENSG00000182899.17          RPL35A protein_coding         83
##   n.cells.g2      mean.g1     mean.g2   mean.diff p.val p.val.adj
## 1         53 0.1003557547 0.009585074 -0.09077068     0         0
## 2         53 0.0655082154 0.011043286 -0.05446493     0         0
## 3         53 0.1293116116 0.013734360 -0.11557725     0         0
## 4         53 0.0001407511 0.122010454  0.12186970     0         0
## 5         53 0.1930818781 0.054302037 -0.13877984     0         0

Pathway enrichment analysis

             Pathway enrichment analysis is useful to assess if a set of differentially spliced genes consists of genes with independent functions or genes sharing similar pathways, and therefore have related functions. The latter is commonly known as gene sets/gene signatures/gene families. We can detect and visualise the biological pathways of genes that are differentially spliced using the BioPathways function.
marvel <- BioPathways(
           MarvelObject=marvel,   # Marvel object from CompareValues function()
           psi.de.sig=0.05,       # Adjusted p-value below which splicing event 
                                             # considered differentially spliced
           method.adjust="fdr",   # Adjust for multiple testing as per p.adjust
           min.genes=10,          # Min.no.of differentially spliced genes required
           p.val.adj.return=0.05, # Return pathways with adjusted p-value below this value
           plot.top.n=10          # Plot these top most significant pathways
           )
marvel$DE$BioPathways[1:5,]
##       GOBPID       Pvalue OddsRatio ExpCount Count Size
## 1 GO:0006413 4.449963e-10  4.549807 17.58498    42   77
## 2 GO:0000184 4.463616e-09  4.025353 18.59510    42   81
## 3 GO:0019083 9.978636e-09  4.024699 17.66741    40   77
## 4 GO:0006614 1.765970e-08  3.898635 17.93769    40   78
## 5 GO:0072594 3.213229e-07  2.540323 35.64540    62  155
##                                                                  Term
## 1                                            translational initiation
## 2 nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
## 3                                                 viral transcription
## 4         SRP-dependent cotranslational protein targeting to membrane
## 5                  establishment of protein localization to organelle
##      p.val.adj
## 1 1.363024e-06
## 2 6.836029e-06
## 3 1.018819e-05
## 4 1.352291e-05
## 5 1.931496e-04
##                                                                                                                                                                                                                                                                                                                                                                                 genes.hit
## 1                                                                    EIF4A1|EIF4A2|EIF4G2|EIF6|RPL10A|NPM1|POLR2G|RPL3|RPL5|RPL7A|RPL8|RPL9|RPL10|RPL13|RPL17|RPL18|RPL21|RPL24|RPL26|RPL30|RPL31|RPL35A|RPL37A|RPL38|RPL39|RPL41|RPL36A|RPLP0|RPLP2|RPS2|RPS6|RPS11|RPS13|RPS15A|RPS16|RPS17|RPS19|RPS20|RPS21|RPS24|RPS25|RPS27A|RPS28|RPS29|UBA52|EIF4H|DENR|CDC123|RPL35|PABPC1|EIF3K
## 2                                                                                                                     RPL10A|RPL3|RPL5|RPL7A|RPL8|RPL9|RPL10|RPL13|RPL17|RPL18|RPL21|RPL24|RPL26|RPL30|RPL31|RPL35A|RPL37A|RPL38|RPL39|RPL41|RPL36A|RPLP0|RPLP2|RPS2|RPS6|RPS11|RPS13|RPS15A|RPS16|RPS17|RPS19|RPS20|RPS21|RPS24|RPS25|RPS27A|RPS28|RPS29|UBA52|RBM8A|RPL35|PABPC1|MAGOHB
## 3                                                                                                                   RPL10A|POLR2G|POLR2H|RPL3|RPL5|RPL7A|RPL8|RPL9|RPL10|RPL13|RPL17|RPL18|RPL21|RPL24|RPL26|RPL30|RPL31|RPL35A|RPL37A|RPL38|RPL39|RPL41|RPL36A|RPLP0|RPLP2|RPS2|RPS6|RPS11|RPS13|RPS15A|RPS16|RPS17|RPS19|RPS20|RPS21|RPS24|RPS25|RPS27A|RPS28|RPS29|SUPT4H1|UBA52|RPL35
## 4                                                                                                                                         RPL10A|RPL3|RPL5|RPL7A|RPL8|RPL9|RPL10|RPL13|RPL17|RPL18|RPL21|RPL24|RPL26|RPL30|RPL31|RPL35A|RPL37A|RPL38|RPL39|RPL41|RPL36A|RPLP0|RPLP2|RPS2|RPS6|RPS11|RPS13|RPS15A|RPS16|RPS17|RPS19|RPS20|RPS21|RPS24|RPS25|RPS27A|RPS28|RPS29|UBA52|RPL35
## 5 CCT6A|HSPA4|IDH1|TNPO1|RPL10A|RAN|RPL3|RPL5|RPL7A|RPL8|RPL9|RPL10|RPL13|RPL17|RPL18|RPL21|RPL24|RPL26|RPL30|RPL31|RPL35A|RPL37A|RPL38|RPL39|RPL41|RPL36A|RPLP0|RPLP2|RPS2|RPS6|RPS11|RPS13|RPS15A|RPS16|RPS17|RPS19|RPS20|RPS21|RPS24|RPS25|RPS27A|RPS28|RPS29|TERF1|UBA52|UBE2D3|YWHAB|YWHAE|YWHAZ|USP9X|RPL35|SEC61G|TIMM8B|DNAJC15|FIS1|TOMM7|UBL5|RPAIN|ATP5IF1|DNAJC19|ROMO1|TOMM5
marvel$DE$BioPathwaysPlot

Detecting isoform switch

             An isoform switch has occured when a gene expression value is not significantly different between groups of cells, but its alternative exon(s) is differentially spliced (Smith et al., 2019). MARVEL can detect these genes and their corresponding splicing events. Moreover, MARVEL will additionally identify differentially expressed genes whose mean expression values changes in the same or opposite direction with their mean PSI values. Use the IsoSwitch function for this purpose. Results are returned in marvel$DE$Cor* slot.

Isoform switch

marvel <- IsoSwitch(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            psi.de.sig=0.05,                   # Adjusted p-value below which the splicing
                                                  # event is considered differentially spliced
            cell.types=c("iPSC", "Endoderm"),  # Cell types to analyse
            method="wilcox",                   # Statistical test for differential gene 
                                                  # expression analysis
            method.adj="fdr",                  # Adjust for multiple testing as per p.adjust
            gene.de.sig=0.05                   # Adjusted p-value below which the gene
                                                  # is considered differentially expressed
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$DE$Cor[1:5, ]
##                                                                       tran_id
## 1 chr15:24962114:24962209:+@chr15:24967029:24967152:+@chr15:24967932:24968082
## 2 chr15:24962114:24962209:+@chr15:24967029:24967240:+@chr15:24967932:24968082
## 3 chr10:78037194:78037304:+@chr10:78037439:78037441:+@chr10:78040204:78040225
## 4                           chr13:43059394:43059714:+@chr13:43062190:43062295
## 5               chr3:197950190:197950221|197950299:+@chr3:197950936:197950978
##   event_type            gene_id gene_short_name      gene_type n.cells.g1
## 1         SE ENSG00000128739.22           SNRPN protein_coding         83
## 2         SE ENSG00000128739.22           SNRPN protein_coding         83
## 3         SE ENSG00000138326.20           RPS24 protein_coding         83
## 4         RI  ENSG00000120675.6         DNAJC15 protein_coding         67
## 5       A5SS ENSG00000182899.17          RPL35A protein_coding         83
##   n.cells.g2      mean.g1     mean.g2   mean.diff p.val p.val.adj
## 1         53 0.1003557547 0.009585074 -0.09077068     0         0
## 2         53 0.0655082154 0.011043286 -0.05446493     0         0
## 3         53 0.1293116116 0.013734360 -0.11557725     0         0
## 4         53 0.0001407511 0.122010454  0.12186970     0         0
## 5         53 0.1930818781 0.054302037 -0.13877984     0         0
##     psi.gene.cor
## 1          Mixed
## 2          Mixed
## 3          Mixed
## 4 Same direction
## 5     Iso-switch
marvel$DE$CorPlot

marvel$DE$CorProp
##         psi.gene.cor freq      pct
## 4     Same direction  203 40.11858
## 3 Opposite direction   88 17.39130
## 1         Iso-switch  114 22.52964
## 2              Mixed  101 19.96047
# Example of a gene with isoform switch
. <- unique(marvel$DE$Cor[which(marvel$DE$Cor$psi.gene.cor=="Iso-switch"), ])
gene_id <- .$gene_id[1]
tran_id <- .$tran_id[1]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=gene_id,                   # gene_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per GeneFeature
            level="gene"                       # "gene"/"splicing" data to plot
            )

plot.1 <- marvel$adhocPlot$Gene

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_id,                   # tran(script)_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=9,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                    
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj" 
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.2 <- marvel$adhocPlot$PSI

grid.arrange(plot.1, plot.2, nrow=1)

Positive correlation

# Example of a gene whose mean changes in the SAME direction as mean PSI value
. <- unique(marvel$DE$Cor[which(marvel$DE$Cor$psi.gene.cor=="Same direction"), ])
gene_id <- .$gene_id[1]
tran_id <- .$tran_id[1]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=gene_id,                   # gene_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per GeneFeature
            level="gene"                       # "gene"/"splicing" data to plot
            )

plot.1 <- marvel$adhocPlot$Gene

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_id,                   # tran(script)_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=9,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                    
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj"
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.2 <- marvel$adhocPlot$PSI

grid.arrange(plot.1, plot.2, nrow=1)

Negative correlation

# Example of a gene whose mean changes in the OPPOSITE direction as mean PSI value
. <- unique(marvel$DE$Cor[which(marvel$DE$Cor$psi.gene.cor=="Opposite direction"), ])
gene_id <- .$gene_id[1]
tran_id <- .$tran_id[1]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=gene_id,                   # gene_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per GeneFeature
            level="gene"                       # "gene"/"splicing" data to plot
            )

plot.1 <- marvel$adhocPlot$Gene

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_id,                   # tran(script)_id to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=9,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                    
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj"
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.2 <- marvel$adhocPlot$PSI

grid.arrange(plot.1, plot.2, nrow=1)

Detecting modality dynamics

             Differential analysis of PSI values detects quantitative changes in splicing patterns between groups of cells. On the other hand, modality analysis detects qualitative changes in splicing patterns between groups of cells. MARVEL can detect changes in modalities between groups of cells and stratify these changes into the following categories:
(1) Explicit: When modality changes between one of the five main modalities, e.g. included to multimodal.
(2) Implicit: When modality changes between primary and dispersed sub-modalities, e.g. included-primary to included-dispersed.
(3) Restricted: No modality change, e.g. included to included.
Use the ModalityChange function for this purpose.
marvel <- ModalityChange(
            MarvelObject=marvel,              # Marvel object from CompareValues function()
            psi.de.sig=0.05,                  # Adjusted p-value below which the splicing
                                                  # event is considered differentially spliced
            cell.types=c("iPSC", "Endoderm"), # Cell types to analyse
            n.cells=25,                       # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                   
            bimodal.adjust=TRUE,             
            seed=1,                           
            modality.column="modality.bimodal.adj" #
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
marvel$DE$ModalityProp
##   modality.change freq      pct
## 1        Explicit   65 12.84585
## 2        Implicit   75 14.82213
## 3      Restricted  366 72.33202
marvel$DE$ModalityPlot

Explicit change

# Example of explicit modality change
tran_ids <- marvel$DE$Modality[which(marvel$DE$Modality["modality.change"]=="Explicit"), "tran_id"]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_ids[1],               # tran(script)_ids to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=5,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                   
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj" 
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.1 <- marvel$adhocPlot$PSI

Implicit change

# Example of implicit modality change
tran_ids <- marvel$DE$Modality[which(marvel$DE$Modality["modality.change"]=="Implicit"), "tran_id"]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_ids[1],               # tran(script)_ids to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=5,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                    
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj" 
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.2 <- marvel$adhocPlot$PSI

Restricted change

# Example of restricted modality change
tran_ids <- marvel$DE$Modality[which(marvel$DE$Modality["modality.change"]=="Restricted"), "tran_id"]

marvel <- PlotValues(
            MarvelObject=marvel,               # Marvel object from CompareValues function()
            cell.types=c("iPSC", "Endoderm"),  # Cell types to plot
            feature=tran_ids[3],               # tran(script)_ids to plot
            maintitle="gene_short_name",       # Column to use as plot title as per ValidatedSpliceFeature
            xlabels.size=5,                    # Font size for x-axis labels
            level="splicing",                  # "gene"/"splicing" data to plot
            n.cells=25,                        # For modality assignment (see AssignModality function above)
            sigma.sq=0.001,                    
            bimodal.adjust=TRUE,               
            seed=1,                            
            modality.column="modality.bimodal.adj"
            )
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
## [1] "Checking... Matrix column (sample) names match sample metadata"
## [1] "Checking... Matrix row (feature) names match feature metadata"
plot.3 <- marvel$adhocPlot$PSI

grid.arrange(plot.1, plot.2, plot.3, nrow=1)

             Interestingly, most splicing events are differentially spliced but have no change in modality. This implies that the change in PSI values for these splicing events between iPSC and endoderm while significant, is also small, and therefore do not lead to change in modality.
             It is conceivable that genes with restricted, or even, implicit modality changes consist of genes that are more conserved, and therefore less likely to undergo explicit modality change. Genes encoding for ribosomal proteins are known to be conserved across many species, including humans. Let’s check the proportion of these highly-conserved genes across the three types of modality changes.
# Tabulate % per modality change
changes <- c("Explicit", "Implicit", "Restricted")

pct.conserved <- NULL

for(i in 1:length(changes)) {

    # Subset relevant modality change
    . <- marvel$DE$Modality[which(marvel$DE$Modality["modality.change"]==changes[i]), ]
    
    # Subset ribosomal genes
    gene_short_names.1 <- unique(grep("^RPL", .$gene_short_name, value=TRUE))
    gene_short_names.2 <- unique(grep("^RPS", .$gene_short_name, value=TRUE))
    
    # Compute % of ribosomal genes
    n.conserved <- length(c(gene_short_names.1, gene_short_names.2))
    pct.conserved[i] <- n.conserved/length(unique(.$gene_short_name)) * 100
    
}

results <- data.frame("modality.change"=changes,
                      "pct"=pct.conserved,
                      stringsAsFactors=FALSE
                      )
    
# Barplot
barplot(pct ~ modality.change, results)

Conclusions

(1) Differential splicing analysis identified differentially spliced genes. Pathway enrichment analysis identified several pathways enriched among these differentially spliced genes. This suggest that differentially spliced genes are not a random set of genes with independent functions. Rather, these genes have related functions, linked by common pathways, and are coordinately regulated through alternative splicing.

(2) Studying changes in PSI values and their corresponding changes in gene expression profile revealed only one-thirds of splicing event changes in the same direction as its corresponding gene expression profile. Hence, it is not always possible to infer direction of PSI value changes from changes in gene expression profile.

(3) Studying modality changes among differentially spliced events when iPSC differentiate into endoderm cells revealed majority of splicing events either had implicit or restricted modality changes. Interestingly, genes with implicit or restricted modality changes are enriched for highly conserved genes. This implies that highly conserved genes are less likely to be regulated (perturbed) by major (explicit) splicing changes.

Comparison with published tools

* BRIE incorporates sequence features to model PSI values. These features were identified and trained on human and mouse data only.

Companion tool

             From this tutorial, we identified over 500 differentially spliced events. We would like to introduce VALERIE (Visulazing ALternative splicing Events from RIbonucleic acid Experiments) - a visualisation platform for visualising alternative splicing events at single-cell resolution. The tutorial for using VALERIE for investigating these differentially spliced events can be found here: https://wenweixiong.github.io/MARVEL

References

    Buen Abad Najar, C. F., Yosef, N., & Lareau, L. F. (2020). Coverage-dependent bias creates the appearance of binary splicing in single cells. Elife, 9. doi:10.7554/eLife.54603
    Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., . . . Gingeras, T. R. (2013). STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15-21. doi:10.1093/bioinformatics/bts635
    Huang, Y., & Sanguinetti, G. (2017). BRIE: transcriptome-wide splicing quantification in single cells. Genome Biol, 18(1), 123. doi:10.1186/s13059-017-1248-5
    Kovaka, S., Zimin, A. V., Pertea, G. M., Razaghi, R., Salzberg, S. L., & Pertea, M. (2019). Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol, 20(1), 278. doi:10.1186/s13059-019-1910-1
    Li, B., & Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12, 323. doi:10.1186/1471-2105-12-323
    Linker, S. M., Urban, L., Clark, S. J., Chhatriwala, M., Amatya, S., McCarthy, D. J., . . . Bonder, M. J. (2019). Combined single-cell profiling of expression and DNA methylation reveals splicing regulation and heterogeneity. Genome Biol, 20(1), 30. doi:10.1186/s13059-019-1644-0
    Shen, S., Park, J. W., Lu, Z. X., Lin, L., Henry, M. D., Wu, Y. N., . . . Xing, Y. (2014). rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A, 111(51), E5593-5601. doi:10.1073/pnas.1419161111
    Smart, A. C., Margolis, C. A., Pimentel, H., He, M. X., Miao, D., Adeegbe, D., . . . Van Allen, E. M. (2018). Intron retention is a source of neoepitopes in cancer. Nat Biotechnol, 36(11), 1056-1058. doi:10.1038/nbt.4239
    Smith, M. A., Choudhary, G. S., Pellagatti, A., Choi, K., Bolanos, L. C., Bhagat, T. D., . . . Starczynowski, D. T. (2019). U2AF1 mutations induce oncogenic IRAK4 isoforms and activate innate immune pathways in myeloid malignancies. Nat Cell Biol, 21(5), 640-650. doi:10.1038/s41556-019-0314-5
    Song, Y., Botvinnik, O. B., Lovci, M. T., Kakaradov, B., Liu, P., Xu, J. L., & Yeo, G. W. (2017). Single-Cell Alternative Splicing Analysis with Expedition Reveals Splicing Dynamics during Neuron Differentiation. Mol Cell, 67(1), 148-161 e145. doi:10.1016/j.molcel.2017.06.003
    Wen, W. X., Mead, A. J., & Thongjuea, S. (2020). Technological advances and computational approaches for alternative splicing analysis in single cells. Comput Struct Biotechnol J, 18, 332-343. doi:10.1016/j.csbj.2020.01.009
    Wen, W. X., Mead, A. J., & Thongjuea, S. (2020b). VALERIE: Visual-based inspection of alternative splicing events at single-cell resolution. PLoS Comput Biol, 16(9), e1008195. doi:10.1371/journal.pcbi.1008195